 gene and disease


A Systematic Evaluation of Knowledge Graph Embeddings for Gene-Disease Association Prediction

Canastra, Catarina, Pesquita, Cátia

arXiv.org Artificial Intelligence

Discovering gene-disease links is important in biology and medicine, enabling disease identification and drug repurposing. Machine learning approaches accelerate this process by leveraging biological knowledge represented in ontologies and the structure of knowledge graphs. Still, many existing works overlook ontologies explicitly representing diseases, missing causal and semantic relationships between them. The gene-disease association problem naturally frames itself as a link prediction task, where embedding algorithms directly predict associations by exploring the structure and properties of the knowledge graph. Some works frame it as a node-pair classification task, combining embedding algorithms with traditional machine learning algorithms. This strategy aligns with the logic of a machine learning pipeline. However, the use of negative examples and the lack of validated gene-disease associations to train embedding models may constrain its effectiveness. This work introduces a novel framework for comparing the performance of link prediction versus node-pair classification tasks, analyses the performance of state-of-the-art gene-disease association approaches, and compares the different order-based formalizations of gene-disease association prediction. It also evaluates the impact of semantic richness through a disease-specific ontology and additional links between ontologies. The framework involves five steps: data splitting, knowledge graph integration, embedding, modeling and prediction, and method evaluation. Results show that enriching the semantic representation of diseases slightly improves performance, while additional links generate a greater impact. Link prediction methods better explore the semantic richness encoded in knowledge graphs. Although node-pair classification methods identify all true positives, link prediction methods perform better overall.
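
The two formulations contrasted in this abstract can be illustrated with a minimal sketch. The abstract does not name specific algorithms, so everything here is an assumption made for illustration: a TransE-style score stands in for the link prediction view, a Hadamard product of the two embeddings stands in for node-pair features, and the entity names, relation name, and vectors are invented toy values.

```python
import math

# Toy 2-dimensional embeddings; names and values are invented for illustration.
entity_emb = {
    "GENE_A": [0.9, 0.1],
    "GENE_B": [0.1, 0.8],
    "DISEASE_X": [1.0, 0.6],
}
relation_emb = {"associated_with": [0.1, 0.5]}


def transe_score(head, relation, tail):
    """Link prediction view: TransE-style score -||h + r - t||_2.

    Scores closer to zero mean the triple is considered more plausible.
    """
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def pair_features(gene, disease):
    """Node-pair classification view: element-wise (Hadamard) product of the
    two embeddings, meant to be passed to a separate classifier."""
    return [g * d for g, d in zip(entity_emb[gene], entity_emb[disease])]


def rank_genes(disease, genes):
    """Rank candidate genes for a disease by TransE-style score, best first."""
    return sorted(
        genes,
        key=lambda g: transe_score(g, "associated_with", disease),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank_genes("DISEASE_X", ["GENE_B", "GENE_A"]))
    print(pair_features("GENE_A", "DISEASE_X"))
```

In the link prediction view the embedding model itself ranks candidates, whereas in the node-pair view the pair features still need a downstream classifier trained on positive and negative examples, which is where the abstract's concern about negative examples applies.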


Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature

Gonzalez, Armando D. Diaz, Yue, Songhui, Hayes, Sean T., Hughes, Kevin S.

arXiv.org Artificial Intelligence

Published biomedical information has increased rapidly and continues to do so. Recent advancements in Natural Language Processing (NLP) have generated considerable interest in automating the extraction, normalization, and representation of biomedical knowledge about entities such as genes and diseases. Our study analyzes germline abstracts to construct knowledge graphs from the immense work that has been done in this area for genes and diseases. This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases. For the extraction of genes and diseases, we employ BioBERT, a BERT model pre-trained on biomedical corpora. We propose an ontology-based and rule-based algorithm to standardize and disambiguate medical terms. For semantic relationships between articles, genes, and diseases, we implemented a part-whole relation approach to connect each entity with its data source and visualize them in a graph-based knowledge representation. Lastly, we discuss the knowledge graph applications, limitations, and challenges to inspire future research on germline corpora. Our knowledge graph contains 297 genes, 130 diseases, and 46,747 triples. Graph-based visualizations are used to show the results.
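
The part-whole idea mentioned in this abstract, linking each extracted entity to the article it came from, could look roughly like the sketch below. This is not the SimpleGermKG implementation: the `has_part` relation name, the `build_triples` helper, and the article identifier are hypothetical and chosen only for illustration.

```python
def build_triples(article_id, genes, diseases):
    """Connect an article to each gene and disease mentioned in it.

    Returns a sorted list of (subject, predicate, object) triples. The
    "has_part" predicate is an assumed name for the part-whole relation.
    """
    triples = set()
    for entity in list(genes) + list(diseases):
        triples.add((article_id, "has_part", entity))
    return sorted(triples)


if __name__ == "__main__":
    # "PMID:0000" is a placeholder identifier, not a real article.
    for triple in build_triples("PMID:0000", ["BRCA1"], ["breast cancer"]):
        print(triple)
```

Using a set before sorting means an entity mentioned several times in the same abstract yields a single triple.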


DeepProphet2 -- A Deep Learning Gene Recommendation Engine

Brambilla, Daniele, Giacomini, Davide Maria, Muscarnera, Luca, Mazzoleni, Andrea

arXiv.org Artificial Intelligence

New powerful tools for tackling life science problems have been created by recent advances in machine learning. The purpose of the paper is to discuss the potential advantages of gene recommendation performed by artificial intelligence (AI). Indeed, gene recommendation engines try to solve this problem: if the user is interested in a set of genes, which other genes are likely to be related to the starting set and should be investigated? This task was solved with a custom deep learning recommendation engine, DeepProphet2 (DP2), which is freely available to researchers worldwide via https://www.generecommender.com?utm_source=DeepProphet2_paper&utm_medium=pdf. Hereafter, insights behind the algorithm and its practical applications are illustrated. The gene recommendation problem can be addressed by mapping the genes to a metric space where a distance can be defined to represent the real semantic distance between them. To achieve this objective, a transformer-based model has been trained on a well-curated freely available paper corpus, PubMed. The paper describes multiple optimization procedures that were employed to obtain the best bias-variance trade-off, focusing on embedding size and network depth. In this context, the model's ability to discover sets of genes implicated in diseases and pathways was assessed through cross-validation. A simple assumption guided the procedure: the network had no direct knowledge of pathways and diseases but learned genes' similarities and the interactions among them. Moreover, to further investigate the space where the neural network represents genes, the dimensionality of the embedding was reduced, and the results were projected onto a human-comprehensible space. In conclusion, a set of use cases illustrates the algorithm's potential applications in a real-world setting.
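
The idea of recommending genes by distance in a metric space can be sketched as a nearest-neighbour lookup. This is only an illustration under stated assumptions, not the DP2 algorithm: the vectors are invented rather than learned by a transformer, and the choice of Euclidean distance to the centroid of the query set is an assumption.

```python
import math

# Hypothetical 3-dimensional gene embeddings; the values are invented and
# carry no biological meaning.
GENE_VECTORS = {
    "BRCA1": [0.9, 0.1, 0.0],
    "BRCA2": [0.8, 0.2, 0.1],
    "TP53": [0.6, 0.5, 0.1],
    "INS": [0.0, 0.1, 0.9],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def recommend(query_genes, k=2):
    """Return the k genes closest to the centroid of the query set,
    excluding the query genes themselves."""
    center = centroid([GENE_VECTORS[g] for g in query_genes])
    candidates = [g for g in GENE_VECTORS if g not in query_genes]
    candidates.sort(key=lambda g: euclidean(GENE_VECTORS[g], center))
    return candidates[:k]


if __name__ == "__main__":
    print(recommend(["BRCA1"], k=2))
```

With a real model, `GENE_VECTORS` would be replaced by the learned embeddings, and the distance function would need to match whatever metric the embedding space was trained for.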


Neo4j Announces First Graph Machine Learning for the Enterprise

#artificialintelligence

Neo4j, the leader in graph technology, announced the latest version of Neo4j for Graph Data Science, a breakthrough that democratizes advanced graph-based machine learning (ML) techniques by leveraging deep learning and graph convolutional neural networks. Until now, few companies outside of Google and Facebook have had the AI foresight and resources to leverage graph embeddings. This powerful and innovative technique calculates the shape of the surrounding network for each piece of data inside of a graph, enabling far better machine learning predictions. Neo4j for Graph Data Science version 1.4 democratizes these innovations to upend the way enterprises make predictions in diverse scenarios from fraud detection to tracking customer or patient journey, to drug discovery and knowledge graph completion. Neo4j for Graph Data Science version 1.4 is the first and only graph-native machine learning functionality commercially available for enterprises.


Neo4j Announces New Version of Neo4j for Graph Data Science

#artificialintelligence

Neo4j, the leader in graph technology, announced the latest version of Neo4j for Graph Data Science, a breakthrough that democratizes advanced graph-based machine learning (ML) techniques by leveraging deep learning and graph convolutional neural networks. Until now, few companies outside of Google and Facebook have had the AI foresight and resources to leverage graph embeddings. This powerful and innovative technique calculates the shape of the surrounding network for each piece of data inside of a graph, enabling far better machine learning predictions. Neo4j for Graph Data Science version 1.4 democratizes these innovations to upend the way enterprises make predictions in diverse scenarios from fraud detection to tracking customer or patient journey, to drug discovery and knowledge graph completion. Neo4j for Graph Data Science version 1.4 is the first and only graph-native machine learning functionality commercially available for enterprises.


Linking Genes and Diseases Using AI

#artificialintelligence

Artificial intelligence (AI) is being harnessed by researchers to track down genes that cause disease. A KAUST team is taking a creative, combined deep learning approach that uses data from multiple sources to teach algorithms how to find patterns between genes and diseases. Machine learning uses algorithms and statistical models to identify patterns and associations among data to solve specific problems. By inputting enough known data, like tagged images of "Jack," the system can eventually learn to suggest other nontagged images that include Jack. Researchers are using this application of AI to find genes that cause diseases.


Artificial intelligence learns complex patterns between genes and diseases

#artificialintelligence

Artificial intelligence (AI) is being harnessed by researchers to track down genes that cause disease. A KAUST team is taking a creative, combined deep learning approach that uses data from multiple sources to teach algorithms how to find patterns between genes and diseases. Machine learning uses algorithms and statistical models to identify patterns and associations among data to solve specific problems. By inputting enough known data, like tagged images of "Jack," the system can eventually learn to suggest other nontagged images that include Jack. Researchers are using this application of AI to find genes that cause diseases.


AI learns complex gene-disease patterns

#artificialintelligence

Artificial intelligence (AI) is being harnessed by researchers to track down genes that cause disease. A KAUST team is taking a creative, combined deep learning approach that uses data from multiple sources to teach algorithms how to find patterns between genes and diseases. Machine learning uses algorithms and statistical models to identify patterns and associations among data to solve specific problems. By inputting enough known data, like tagged images of "Jack," the system can eventually learn to suggest other nontagged images that include Jack. Researchers are using this application of AI to find genes that cause diseases.